# Visual Question Answering (VQA)
## FLODA Deepfake

FLODA is a deepfake detection model that combines image caption generation with authenticity assessment and frames detection as a visual question answering task to achieve high-precision results.

- License: Apache-2.0
- Tags: Text-to-Image, English
- Author: byh711
- Downloads: 113
- Likes: 0
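## Blip2 Flan T5 Xl Coco

BLIP-2 is a vision-language model that bootstraps language-image pretraining from a frozen image encoder and a frozen large language model, training only a lightweight bridging module (the Querying Transformer, or Q-Former) between them. It supports tasks such as image captioning and visual question answering.

- License: MIT
- Tags: Image-to-Text, Transformers, English
- Author: Salesforce
- Downloads: 2,379
- Likes: 14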
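To make the frozen-components design concrete, here is a toy NumPy sketch of what the bridge does: a fixed set of learned queries cross-attends over the frozen image encoder's output, and the result is projected into the frozen language model's embedding space and prepended to the text as a soft visual prompt. This is not the BLIP-2 implementation. The real Q-Former is a multi-layer Transformer with 32 learned queries, whereas this sketch collapses it into a single cross-attention step; the function names (`query_bridge`, `build_llm_input`) and all sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def query_bridge(image_feats, queries, w_k, w_v, w_proj):
    """Hypothetical, simplified stand-in for a Q-Former-style bridge.

    image_feats: (num_patches, d_img) output of the frozen image encoder
    queries:     (num_queries, d_q) learned query vectors (trainable)
    w_k, w_v:    (d_img, d_q) key/value projections (trainable)
    w_proj:      (d_q, d_llm) projection into the frozen LLM's embedding space (trainable)
    """
    keys = image_feats @ w_k
    values = image_feats @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (num_queries, num_patches)
    attended = softmax(scores, axis=-1) @ values            # (num_queries, d_q)
    return attended @ w_proj                                 # (num_queries, d_llm)


def build_llm_input(visual_prompt, text_embeds):
    # Prepend the projected query outputs to the text embeddings (soft visual prompt).
    return np.concatenate([visual_prompt, text_embeds], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_img, d_q, d_llm, num_queries = 64, 32, 48, 8  # hypothetical toy sizes
    queries = rng.standard_normal((num_queries, d_q))
    w_k = rng.standard_normal((d_img, d_q))
    w_v = rng.standard_normal((d_img, d_q))
    w_proj = rng.standard_normal((d_q, d_llm))
    for num_patches in (50, 197):
        # Random arrays stand in for frozen-encoder output and embedded text tokens.
        image_feats = rng.standard_normal((num_patches, d_img))
        prompt = query_bridge(image_feats, queries, w_k, w_v, w_proj)
        text_embeds = rng.standard_normal((5, d_llm))
        print(num_patches, prompt.shape, build_llm_input(prompt, text_embeds).shape)
```

The point of the fixed query set is that the language model always receives the same number of visual tokens (8 here) no matter how many image patches the encoder produces, and only the bridge parameters would need gradients; the encoder and language model weights are never touched.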
## Git Base

GIT (GenerativeImage2Text) is a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for image-to-text generation tasks.

- License: MIT
- Tags: Image-to-Text, Transformers, Supports Multiple Languages
- Author: microsoft
- Downloads: 365.74k
- Likes: 93
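One way to see what "conditioned on both image tokens and text tokens" means for a single decoder is its attention mask. The sketch below assumes the sequence-to-sequence mask described for GIT: image tokens come first and attend to each other bidirectionally, while each text token attends to all image tokens plus the text tokens up to and including itself. The function name `git_style_attention_mask` is hypothetical, and this builds only the mask, not a model.

```python
def git_style_attention_mask(num_image_tokens, num_text_tokens):
    """Return mask where mask[i][j] is True if position i may attend to position j.

    Layout assumption: image tokens occupy positions [0, num_image_tokens),
    text tokens occupy the remaining positions.
    """
    n = num_image_tokens + num_text_tokens
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < num_image_tokens:
                # Every position (image or text) may attend to every image token.
                mask[i][j] = True
            elif i >= num_image_tokens:
                # Text attends causally to text at or before its own position.
                mask[i][j] = j <= i
            # Image tokens never attend to text tokens, so those entries stay False.
    return mask


if __name__ == "__main__":
    # Two image tokens followed by three text tokens.
    # prints one row per line: 11000, 11000, 11100, 11110, 11111
    for row in git_style_attention_mask(2, 3):
        print("".join("1" if allowed else "0" for allowed in row))
```

Because the image block is fully visible while the text block is causal, the same decoder can both read the whole image and generate text left to right, which is what lets one network handle captioning-style generation.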